Abstract

We establish the mean square consistency of running (ordinary) least squares linear
regression smoothers, under realistic conditions on the joint distribution of the abscissa and
ordinate (X and Y below) variables. The windows used in the running least squares fits need
not be centered on the points for which they are used. In fact, we show that taking a window
of points entirely to one side of a data point, fitting a line to that window and using the value
of that line at the target point is consistent. It follows that the Supersmoother of Friedman
and Stuetzle (1982) and the Split Linear Smoother of McDonald and Owen (1984) are both
consistent.
1. Running Linear Smoothing.

Given observations $(X_i, Y_i) \in \mathbb{R}^2$, $1 \le i \le n$, a running linear smooth value
at $x$ is $\alpha + \beta x$, where $\alpha$ and $\beta$ are regression coefficients from a linear
regression of $Y$ on $X$ over a set of observations indexed by $J(x)$.

In practice $J(X_i)$ typically consists of the union of: the indices of the smallest $k_n/2$
points in $\{X_j : X_j > X_i\}$, the indices of the largest $k_n/2$ points in $\{X_j : X_j < X_i\}$, and
$\{i\}$ itself, with sensible modifications to handle ties and end effects. Furthermore $J(x)$ is not
usually calculated for $x$'s that do not correspond to sample points, it being more expedient to
interpolate if necessary. In this paper it is more convenient to define the smoother at all points
without resort to interpolation. The above describes a central running linear smoother, the
adjective 'central' serving to distinguish it from one sided smoothers in which $J(x)$ consists of
the $k_n$ nearest neighbors of $x$ on the left (or right).
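The two kinds of index set can be sketched as follows. This is a minimal illustration only: the function names `central_window` and `left_window` are hypothetical, the sample values are assumed distinct, and none of the "sensible modifications" for ties and end effects are attempted.

```python
def central_window(xs, i, k):
    """Indices of X_i, its k//2 nearest sample points above and its k//2 nearest below."""
    below = sorted((j for j in range(len(xs)) if xs[j] < xs[i]), key=lambda j: xs[i] - xs[j])
    above = sorted((j for j in range(len(xs)) if xs[j] > xs[i]), key=lambda j: xs[j] - xs[i])
    return sorted(below[:k // 2] + [i] + above[:k // 2])


def left_window(xs, i, k):
    """Indices of the k nearest sample points strictly to the left of X_i."""
    below = sorted((j for j in range(len(xs)) if xs[j] < xs[i]), key=lambda j: xs[i] - xs[j])
    return sorted(below[:k])  # fewer than k indices near the left end of the sample


xs = [0.9, 0.1, 0.5, 0.3, 0.7]
print(central_window(xs, 2, 2))  # -> [2, 3, 4], the window around X = 0.5
print(left_window(xs, 2, 2))     # -> [1, 3], the two points nearest 0.5 on the left
```

Near the ends of the sample both functions simply return fewer indices, which is one possible end-effect convention among several.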

The Supersmoother of Friedman and Stuetzle (1982) combines several central smoothers,
differing only in the value of $k_n$. Its design goal is to make more use of the smaller windows
in regions of $X$-space where the curvature of the regression of $Y$ on $X$ seems large relative to
the variance of $Y$ and to emphasize the larger windows where the curvature is smaller, so as
to locally trade off bias versus variance.

The Split Linear Smoother of McDonald and Owen (1984) combines central smoothers
with left and right sided smoothers. The design goal is to provide an edge-detecting smoother
that produces output that is piece-wise smooth with a small number (possibly zero) of
discontinuities in the curve or its first derivative. It does this by taking a weighted average of the
smooths at each point; near a discontinuity it uses larger weights for the windows that extend
in the direction opposite the discontinuity.

Because these smoothers are used as building blocks in non-parametric regression
techniques such as projection pursuit regression (Friedman and Stuetzle (1981)) and A.C.E.
(Breiman and Friedman (1984)), proofs of their consistency have ramifications beyond smoothing.

Stone (1977) shows that linear fits over sets of nearest neighbors are consistent when
trimmed. The nearest neighbor linear fits can be expressed as a weighted average of the $Y$
values in the neighbor set. Trimming involves adjusting those weights if necessary to make
sure that their ratios to the weights of some consistent estimator are uniformly bounded above
and below. The consistent estimator may be taken to be a nearest neighbor average. He states
that linear fits to nearest neighbors are not necessarily consistent.

Breiman and Friedman (1982) show that a modified central running linear smoother
is consistent. The modification is greatest at the end points of the sample where a lack
of observations makes it impossible to form the usual symmetric nearest neighbor window.
Unfortunately, the main reason for using linear as opposed to constant fits is to reduce bias
at the ends. Their modification would be severe for a one sided smoother that effectively
treats every point as an endpoint. (There are reasons other than consistency for making the
modification. They need a smoother that is a bounded linear operator on the observed $Y$
values whatever the $X$ values, and the bound must be uniform in the sample size. Running
linear smooths (central or sided as defined here) have bounds that increase as the square root of
the window size.) Rather than changing the definition of the running linear smoother we place
additional restrictions on the distribution of the observations. Fortunately the restrictions are
realistic for applications.

By restricting the distributions we do not establish what Stone (1977) calls universal
consistency. He also obtains $L^r$ consistency for all $r \ge 1$ such that the $r$-th moment of $Y$ is
finite, whereas we only consider $L^2$.


2.  Notation. 

This  section  introduces  the  notation  and  defines  a  left  sided  running  linear  smoother. 

The observations are (a finite prefix of) an infinite sequence of i.i.d. random variables
$(X_i, Y_i)$, $1 \le i < \infty$. $X$ and $Y$ represent the complete sequences. The $r$-th order statistic
among the first $n$ terms of $X$ will be denoted $X^{(n)}_{(r)}$ and similarly for $Y$. The first $n$ terms will
be collectively denoted $X^{(n)}$.

For each $x \in \mathbb{R}$ and each positive integer $n$ define $J_n(x)$, the $n$-th window about $x$, as
follows: if $x < X^{(n)}_{(1)}$ or $x > X^{(n)}_{(n)}$ then $J_n(x) = \emptyset$; if $x \in X^{(n)}$ then $J_n(x)$ consists of the greatest
$k_n$ terms of $X^{(n)}$ that are less than or equal to $x$ (if there are not $k_n$ such terms take all such
terms and if a tie need be broken take the term(s) with smallest index); otherwise take the
smallest term in $X^{(n)}$ that is greater than or equal to $x$ and the $k_n - 1$ largest other terms that
are less than or equal to $x$, with the obvious handling of ties and shortfalls. For the typical
point $x$, $J_n(x)$ consists of its nearest neighbor on the right and its $k_n - 1$ nearest neighbors on
the left. For any observation $X_i$, the window $J_n(X_i)$ has no points strictly to the right
of $X_i$. By construction, except in the case $J_n(x)$ is void,

$$\min_{i \in J_n(x)} X_i \le x \le \max_{i \in J_n(x)} X_i.$$
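The window definition above can be sketched in a few lines. The function name `section2_window` is hypothetical, and the code is only one reading of the "obvious handling of ties and shortfalls": ties are broken by smallest index and a shortfall simply yields a smaller window.

```python
def section2_window(xs, x, k):
    """Indices forming the window J_n(x) for the sample xs and window size k."""
    if x < min(xs) or x > max(xs):
        return []  # the window is empty outside the sample range
    # sample points at or below x: largest value first, smallest index first among ties
    left = sorted((j for j, v in enumerate(xs) if v <= x), key=lambda j: (-xs[j], j))
    if x in xs:
        return sorted(left[:k])  # x is a sample value: the k greatest terms <= x
    # otherwise one point at or above x plus the k - 1 greatest terms below x
    right = min((j for j, v in enumerate(xs) if v >= x), key=lambda j: (xs[j], j))
    return sorted([right] + left[:k - 1])


xs = [0.9, 0.1, 0.5, 0.3, 0.7]
print(section2_window(xs, 0.5, 2))  # -> [2, 3]
print(section2_window(xs, 0.6, 3))  # -> [2, 3, 4]
```

In the second call the window values are 0.5, 0.3 and 0.7, so the target 0.6 lies between the window minimum and maximum, as the displayed inequality requires.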

The smooth value at $x$ based on the first $n$ observations is denoted $m_n(x, X, Y)$. As
written, it depends on the whole sequences $X$ and $Y$, but it will really only depend on the first
$n$ terms of them.


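If $J_n(x)$ is empty, then take $m_n(x, X, Y) = 0$. Otherwise compute

$$\mu_n(x) = \frac{1}{k_n} \sum_{i \in J_n(x)} X_i$$

and

$$\sigma_n^2(x) = \frac{1}{k_n} \sum_{i \in J_n(x)} (X_i - \mu_n(x))^2.$$

The left sided running linear smooth value is

$$m_n(x, X, Y) = \frac{1}{k_n} \sum_{i \in J_n(x)} Y_i \,(1 + \hat{x}\hat{X}_i),$$

where for any $w$, $\hat{w}$ is 0 if $\sigma_n(x) = 0$ and otherwise

$$\hat{w} = \frac{w - \mu_n(x)}{\sigma_n(x)}.$$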

Below, the dependence upon $x$ and $n$ of $J$, $\sigma$ and $\mu$ is sometimes suppressed.
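As a check on the definition, the following sketch computes the smooth value from standardized quantities and compares it with the usual slope-and-intercept form of a least squares line. The function names and the toy window are hypothetical, and the actual number of window points is used in place of $k_n$.

```python
from math import sqrt


def smooth_value(x, xw, yw):
    """(1/k) * sum of Y_i * (1 + xhat * Xhat_i) over the window points (xw, yw)."""
    k = len(xw)
    if k == 0:
        return 0.0  # convention for an empty window
    mu = sum(xw) / k
    sigma = sqrt(sum((v - mu) ** 2 for v in xw) / k)

    def hat(w):
        # standardize by the window mean and standard deviation; 0 if sigma is 0
        return 0.0 if sigma == 0 else (w - mu) / sigma

    return sum(y * (1 + hat(x) * hat(v)) for v, y in zip(xw, yw)) / k


def ols_value(x, xw, yw):
    """The same fitted value from the textbook slope formula (assumes sigma > 0)."""
    k = len(xw)
    mx, my = sum(xw) / k, sum(yw) / k
    slope = sum((v - mx) * (y - my) for v, y in zip(xw, yw)) / sum((v - mx) ** 2 for v in xw)
    return my + slope * (x - mx)


xw = [0.1, 0.2, 0.4, 0.5]
yw = [1.0, 1.3, 1.2, 2.0]
print(smooth_value(0.5, xw, yw), ols_value(0.5, xw, yw))  # the two values agree
```

The agreement reflects the fact that the weights $(1 + \hat{x}\hat{X}_i)/k_n$ are exactly the least squares weights for predicting at $x$.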

The  following  identity  will  often  be  convenient 

$$\sum_{i \in J} (1 + \hat{x}\hat{X}_i)^2 = k_n (1 + \hat{x}^2).$$
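The identity follows by expanding the square: the standardized window values sum to zero and their squares sum to the window size. A small numerical check, with a hypothetical helper `identity_sides` and arbitrary toy values, might look like this.

```python
from math import sqrt


def identity_sides(x, xw):
    """Return both sides of sum_i (1 + xhat * Xhat_i)^2 = k * (1 + xhat^2)."""
    k = len(xw)
    mu = sum(xw) / k
    sigma = sqrt(sum((v - mu) ** 2 for v in xw) / k)
    xhat = (x - mu) / sigma
    lhs = sum((1 + xhat * (v - mu) / sigma) ** 2 for v in xw)
    rhs = k * (1 + xhat ** 2)
    return lhs, rhs


lhs, rhs = identity_sides(0.55, [0.12, 0.30, 0.31, 0.47, 0.52])
print(lhs, rhs)  # equal up to floating point rounding
```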

The quantity $\hat{x}$ is the distance of the target point from the window mean expressed in
window standard deviations. A large magnitude indicates that the target point is not well
represented by the window set and this will be reflected in bias and variance expressions
below. The construction of $J$ guarantees, by Chebychev's inequality, that $\hat{x}^2 \le k_n$. Breiman
and Friedman's (1982) modified running linear smoother truncates $\hat{x}$ to $\pm 1$ when it exceeds 1
in absolute value. They also note that $|\hat{x}| \le 1$ whenever (in a central window) there are equal
numbers of points greater than and less than $x$. The construction above is in effect truncating
$\hat{x}$ at $\pm k_n^{1/2}$, which gives less control, but obviates the need to modify the smoother within
the range of the observed $X_i$. For convenience of exposition the smoother is zero outside that
range, although there would be no difficulty in extrapolating by extending the smooth values
at the left- and right-most sample points to the left and right of the sample respectively. If $x$ is
an atom of $\mathcal{L}(X_i)$ then $\hat{x}^2 \to 0$ almost surely. Otherwise, under reasonable sampling conditions
Lemma 2 in section 5 shows that for a one sided smoother $\hat{x}^2 \to 3$ in probability.
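The limit 3 can be anticipated heuristically: if the window points are roughly uniform on an interval of length $h$ ending at $x$, the window mean is about $h/2$ from $x$ and the window variance is about $h^2/12$, so $\hat{x}^2 \approx (h^2/4)/(h^2/12) = 3$. The simulation below is a rough illustration under assumed settings (uniform $X$ on [0, 1], target 0.5, a fixed seed, a hypothetical function name); it ignores the single window point to the right of $x$.

```python
import random


def one_sided_xhat_sq(n, k, x=0.5, seed=0):
    """xhat^2 at x for the k sample points nearest x on its left, X uniform on [0, 1]."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    left = sorted(v for v in xs if v <= x)[-k:]  # the k greatest sample points <= x
    mu = sum(left) / len(left)
    var = sum((v - mu) ** 2 for v in left) / len(left)
    return (x - mu) ** 2 / var


print(one_sided_xhat_sq(50000, 5000))  # typically close to 3
```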


3.  Main  Result. 

This section treats the pointwise mean square consistency of running linear smoothers, which
means the convergence of $m_n(x, X, Y)$ to $m(x)$. Several technical Lemmas proved in section
5 are used. The following assumptions will be used to prove consistency of running linear
smoothers:

(I) $X_i$ are iid from distribution $F = F_j + F_{ac}$ where

a) $F$ has support on $[0, 1]$.

b) $F_{ac}$ has a continuous positive density on $[0, 1]$.

c) $F_j$ has a finite number of jumps.

(II) $Y_i$ are conditionally independent given $X$ with $V(Y_i \mid X) \le \sigma^2$ for some $\sigma^2 < \infty$.

(III) If $m(x) = E(Y \mid X = x)$ then for all but finitely many points $x$ $\exists M(x) < \infty$ such that
$|m(x) - m(x')| \le M(x)\,|x - x'|$.

The first assumption constrains the distribution of the $X$'s. For technical reasons the
support of these random variables has to be compact and have an absolutely continuous part
with density continuous and bounded away from zero. Jump points are allowed, although
there can not be an accumulation point of the jumps of the distribution. The uniform bound
on the conditional variance of the $Y$ random variables given the $X$ variable is weaker than
the standard homoscedasticity assumption but stronger than the assumption that $Y \in L^2$.
Condition (III) is unusual in that it allows simple jump discontinuities in the regression.


Theorem 1 If conditions (I) - (III) hold, $k_n \to \infty$, and $k_n^3/n^2 \to 0$ then $\{m_n\}$ is mean square
consistent at $x$ for almost all $x$.

Proof: Define $S = \{x : \forall \delta > 0,\ P(x < X < x + \delta) > 0,\ P(x - \delta < X < x) > 0$, and
either $P(X = x) > 0$ or condition III holds$\}$. From the hypothesis it is easy to show that
$P(X \in S) = 1$. From the triangle inequality

$$\{E(m(x) - m_n(x, X, Y))^2\}^{1/2} \le \{E(m(x) - m_n(x, X, m(X)))^2\}^{1/2}
+ \{E(m_n(x, X, m(X)) - m_n(x, X, Y))^2\}^{1/2}, \qquad (1)$$

where the first term on the right hand side represents bias$^2$ and the second term represents
variance. $m(X)$ is the sequence $m(X_i)$, $1 \le i < \infty$. It is sufficient to show both terms converge
to zero. First the variance term is considered, using $E_X(\cdot)$ to denote conditional expectation
given the $X$-sequence.


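$$
\begin{aligned}
E\bigl(m_n(x, X, m(X)) - m_n(x, X, Y)\bigr)^2
&= E\Bigl(\frac{1}{k_n^2}\Bigl[\sum_{i \in J} E_X\bigl((Y_i - m(X_i))^2\bigr)(1 + \hat{x}\hat{X}_i)^2 \\
&\qquad + \sum_{i \ne j} E_X\bigl((Y_i - m(X_i))(Y_j - m(X_j))\bigr)(1 + \hat{x}\hat{X}_i)(1 + \hat{x}\hat{X}_j)\Bigr]\Bigr) \\
&= E\Bigl(\frac{1}{k_n^2}\sum_{i \in J} E_X\bigl((Y_i - m(X_i))^2\bigr)(1 + \hat{x}\hat{X}_i)^2\Bigr) && (a) \\
&\le E\Bigl(\frac{\sigma^2}{k_n^2}\sum_{i \in J}(1 + \hat{x}\hat{X}_i)^2\Bigr) && (b) \\
&= E\Bigl(\frac{\sigma^2}{k_n}(1 + \hat{x}^2)\Bigr) && (c)
\end{aligned}
$$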


Equation (a) is a consequence of the conditional independence of the $Y_i$'s given the $X_i$'s, (b)
follows from hypothesis (II) and (c) from the identity given in section 2. From Lemma
2 it follows that the last line converges to zero by the dominated convergence theorem.

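The bias$^2$ term is

$$
\begin{aligned}
E\bigl((m(x) - m_n(x, X, m(X)))^2\bigr)
&= E\Bigl(\frac{1}{k_n}\sum_{i \in J}(m(x) - m(X_i))(1 + \hat{x}\hat{X}_i)\Bigr)^2 \\
&\le E\Bigl(\frac{1}{k_n^2}\sum_{i \in J}(m(x) - m(X_i))^2 \sum_{i \in J}(1 + \hat{x}\hat{X}_i)^2\Bigr) && (a) \\
&= E\Bigl(\frac{1 + \hat{x}^2}{k_n}\sum_{i \in J}(m(x) - m(X_i))^2\Bigr) && (b) \\
&\le 2E\Bigl(\sum_{i \in J}(m(x) - m(X_i))^2\Bigr) && (c)
\end{aligned}
$$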

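where (a) is a consequence of the Cauchy-Schwarz inequality, (b) is yet another application of
the identity of the previous section, and (c) follows from Lemma 2, since $\hat{x}^2 \le k_n$ gives
$(1 + \hat{x}^2)/k_n \le 2$. The first equality holds because the weights $(1 + \hat{x}\hat{X}_i)/k_n$ sum to one
over the window. The last term converges to zero by Lemma 3.

Hence for all $x \in S$, the estimate is mean square consistent.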
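A small Monte Carlo experiment can illustrate the conclusion, though it proves nothing. Everything in this sketch is an assumption made for the illustration: uniform $X$ on [0, 1], regression function $m(x) = x^2$, Gaussian noise with standard deviation 0.5, window size $k_n = \lfloor n^{1/2} \rfloor$ (so that $k_n \to \infty$ and $k_n^3/n^2 \to 0$), fixed seeds, and the function names. The window is purely left sided, without the extra right-hand point.

```python
import random


def left_smooth(x, xs, ys, k):
    """Least squares line through the k sample points nearest x on its left, evaluated at x."""
    pts = sorted((v, y) for v, y in zip(xs, ys) if v <= x)[-k:]
    kk = len(pts)
    if kk == 0:
        return 0.0
    mx = sum(v for v, _ in pts) / kk
    my = sum(y for _, y in pts) / kk
    sxx = sum((v - mx) ** 2 for v, _ in pts)
    if sxx == 0:
        return my
    slope = sum((v - mx) * (y - my) for v, y in pts) / sxx
    return my + slope * (x - mx)


def mse_at(x, n, reps=100, seed=1):
    """Monte Carlo estimate of E(m(x) - m_n(x))^2 for m(t) = t^2 and k_n = floor(sqrt(n))."""
    rng = random.Random(seed)
    k = max(2, int(n ** 0.5))
    total = 0.0
    for _ in range(reps):
        xs = [rng.random() for _ in range(n)]
        ys = [v * v + rng.gauss(0.0, 0.5) for v in xs]
        total += (left_smooth(x, xs, ys, k) - x * x) ** 2
    return total / reps


print(mse_at(0.5, 200), mse_at(0.5, 5000))  # the second value is typically much smaller
```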


4.  Further  Results. 

This section extends the above result to central and right sided smoothers, and discusses
consistency under design measures on $X$, rates of convergence and global convergence.

Corollary 1 Under the conditions of Theorem 1, right sided and central smoothers are
pointwise mean square consistent.

Proof For right sided smoothers, the result follows by symmetry. For central smoothers
define $J_n^{L}(x)$ as for a left sided smoother, $J_n^{R}(x)$ as for a right sided smoother and put
$J_n(x) = J_n^{L}(x) \cup J_n^{R}(x)$. The quantity $\hat{x}^2$ for the central smoother will be smaller than nine
times the largest such quantity from the one sided smoothers, and $(x - \mu)^2$ will be no larger for
the central smoother than the largest such value from the sided smoothers. (The first bound
is obtained by straightforward calculation and is very conservative.) It follows from the bias
and variance bounds above that the central smoother is mean square consistent at all $x \in S$.

Corollary 2 Under the conditions of Theorem 1, any smoother that at each point is a
convex combination of central, left, and right sided smoothers is pointwise mean square
consistent. This includes, with appropriate window sizes, the Supersmoother and the Split Linear
Smoother.


Proof  Immediate. 

Remark 1 Pointwise consistency implies pointwise convergence in probability, which
when established for almost all points (i.e. all $x \in S$) implies convergence in probability of
$m_n(X_0)$ to $m(X_0)$ where $X_0$ is independent of $X$ and has the same distribution as $X_i$.


Remark 2 Instead of observing i.i.d. pairs $(X_i, Y_i)$, consider choosing $X$ according to a
design measure on the sequence and observing $Y$ whose terms are conditionally independent
given $X$, and satisfy the distributional assumptions as above. Then $E((1 + \hat{x}^2)/k_n) \to 0$ is
sufficient to guarantee that the variance term will vanish as $n \to \infty$. If there is a positive
minimum conditional variance of $Y$ given $X = x$, then $E((1 + \hat{x}^2)/k_n) \to 0$ is also necessary
for the variance term to vanish. Similarly the bias term can be controlled by design. If all the
$X_i$ are sufficiently uniformly spaced then $\hat{x}^2$ will be bounded in $x$ and $n$.
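For an exactly equispaced design the last claim can be checked by direct calculation. Taking the target to be the right-most of $k$ equally spaced window points gives $\hat{x}^2 = 3(k-1)/(k+1) < 3$, whatever the spacing. The sketch below verifies this with exact rational arithmetic; the function name and the unit spacing are assumptions of the illustration.

```python
from fractions import Fraction


def xhat_sq_equispaced(k):
    """xhat^2 when the window is k unit-spaced points and the target is the right-most one."""
    pts = [Fraction(j) for j in range(k)]
    x = pts[-1]
    mu = sum(pts) / k
    var = sum((p - mu) ** 2 for p in pts) / k
    return (x - mu) ** 2 / var


for k in (2, 10, 100):
    print(k, xhat_sq_equispaced(k))  # equals 3(k - 1)/(k + 1), always below 3
```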


Theorem 2 (Global convergence) Under the conditions of Theorem 1, and if
$|m(x) - m(x')| \le M|x - x'|$, the central smoother satisfies $m_n(X_0, X, Y) \to m(X_0)$ in $L^2$,
where $X_0$ is independent of $X$ and has the same distribution as $X_i$.


Proof Let $Z$ be an indicator variable which is 1 when $\hat{X}_0^2 \le 1$. Then the variance term is
bounded by

$$\frac{2\sigma^2}{k_n} + 2\sigma^2 E(1 - Z)
\le \frac{2\sigma^2}{k_n} + 2\sigma^2 \bigl( P(X_0 < [\ldots]) + P(X_0 > [\ldots]) \bigr) \to 0,$$

(where $X^{(n)}_{(r)}$ is defined in section 2), and the bias$^2$ term is bounded by

$$[\ldots] \to 0,$$

where $U^{(n)}_{(i)}$ is the $i$-th order statistic out of $n$ independent uniform random variables and $\gamma$ is
the same quantity found in the proofs of the Lemmas.

Remark 3 Notice that the bounds above imply the squared error can be made to converge
at rate [...] by letting $k_n$ grow at a rate slightly slower than [...]. The optimal global rate
of convergence assuming one derivative is [...] (Stone (1982)). This proof also goes through
for the sided smoothers except that $Z$ must indicate that $\hat{X}_0^2 \le B$ for some $B > 3$. (Recall
$\hat{x}^2 \to 3$ in probability.) It can be shown that $P(\hat{x}^2 > B) \to 0$ at least as fast as $k_n^{-[\ldots]}$ and so
the same rates obtain for the sided smoothers as for central ones. However, the main use of
sided windows is for situations in which it is suspected that there is a discontinuity.

Remark 4 Compared to linear fits over nearest neighbor windows, the central smoother is
based on points farther away from the target. On the other hand, a linear fit over the $k_n$
nearest neighbors puts no bound at all on $\hat{x}^2$ since, in the worst case, all the neighbors can
be in a cluster on one side of the target point. Using symmetric nearest neighborhoods $\hat{x}^2$
is bounded by $k_n$ for all points and by 1 for most points. (The former was handy in some
dominated convergence arguments.) Thus in addition to being faster to compute, linear fits
over symmetric nearest neighborhoods are safer than those over nearest neighborhoods.
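The contrast can be made concrete with a toy configuration (all values hypothetical). When every neighbor sits in a tight cluster on one side of the target, $\hat{x}^2$ is far larger than the window size; once the window contains a point on each side of the target, the bound $\hat{x}^2 \le k_n$ applies.

```python
def xhat_sq(x, pts):
    """Squared distance of x from the window mean, in window standard deviations."""
    k = len(pts)
    mu = sum(pts) / k
    var = sum((p - mu) ** 2 for p in pts) / k
    return (x - mu) ** 2 / var


cluster = [1 + 0.01 * j for j in range(5)]  # five neighbors, all to the right of x = 0
straddle = [-3.0] + cluster[:4]             # five points with x = 0 between min and max

print(xhat_sq(0.0, cluster))   # in the thousands, far above the window size 5
print(xhat_sq(0.0, straddle))  # below the window size 5
```

Shrinking the spacing inside the cluster makes the first value arbitrarily large, which is the sense in which nearest neighbor windows put no bound on $\hat{x}^2$.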

Remark  5  The  Split  Linear  Smoother  and  the  Supersmoother  were  designed  to  meet  specific 
finite sample goals. We believe those goals to be more important than asymptotic behavior.
The  point  of  this  paper  is  to  show  that  without  any  modification  in  the  observed  range  of  X 
the  attainment  of  finite  sample  goals  is  not  at  undue  asymptotic  expense. 
5. Proof of Lemmas.


Conditions stated in section 3 are assumed to hold. The main idea driving the following
Lemmas is that under the distributional assumptions (I)-(III) the observations $X_i$ "sufficiently
close" to $x$ behave like observations from a uniform distribution. Once this correspondence is
established exact calculations can be made for the uniform variates in a way that gives bounds
for the quantities of interest. Lemma 1 makes precise the sense in which the points of $J_n(x)$
get "sufficiently close" to $x$. Lemma 2 provides a construction that bounds the small order
statistics of the $x - X_i$'s by the order statistics of a uniform distribution. This construction is
then exploited by Lemmas 2 and 3.

By  the  definition  of  J(x)  there  may  exist  an  element  of  J(x)  to  the  right  of  x.  In  Lemmas 
2  and  3  that  element  allows  the  application  of  Chebychev’s  inequality.  However,  explicit 
consideration of that point would add unnecessary complexity involving quantities of $O(1/k_n)$
to  the  following  proofs.  Therefore  with  the  exception  of  the  appeal  to  Chebychev’s  inequality 
that  point  is  not  considered.  The  reader  is  invited  to  make  the  necessary  (minor)  alterations 
to  include  the  point  if  so  motivated. 

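Lemma 1: $\forall x \in S$, $\max_{i \in J_n(x)} |x - X_i| \xrightarrow{a.s.} 0$.

Proof: Let $T_n^m = \#\{j \le n : x - 1/m < X_j \le x\}$. From definition, $T_n^m \ge k_n$ implies
$\max_{i \in J_n(x)} |x - X_i| \le 1/m$. From the law of large numbers $T_n^m/n \xrightarrow{a.s.} P(x - 1/m < X \le x)$,
which is positive by hypothesis. Since $k_n/n \to 0$, there exists a null set $N_m$ such that on $N_m^c$,
$\limsup \{\max_{i \in J_n(x)} |x - X_i|\} \le 1/m$. The result then holds on $(\bigcup_{m=1}^{\infty} N_m)^c$.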


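Lemma 2: $\forall x \in S$, $\frac{1}{k_n}(1 + \hat{x}^2)$ is uniformly bounded, with $\frac{1}{k_n}(1 + \hat{x}^2) \xrightarrow{P} 0$.

Proof: Chebychev's inequality yields $\hat{x}^2 \le k_n$. If $x$ is an atom, then by definition and
the law of large numbers, $\hat{x}^2 \to 0$ almost surely. Hence without loss of generality it may be
assumed $x$ is not an atom. In the terminology of section 2 it is sufficient to show
$\hat{x}^2 \xrightarrow{P} 3$, or equivalently, $(\sigma_n^2 + (x - \mu_n)^2)/(x - \mu_n)^2 \xrightarrow{P} \tfrac{4}{3}$.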


Let

$$X_j^* = \begin{cases}
x - X_j + [F(X_j) - F(X_j^-)]\,\eta_j & \text{if } X_j \le x \\
X_j + [F(X_j) - F(X_j^-)]\,\eta_j & \text{otherwise}
\end{cases}$$

where $\eta_j$ is an independent sequence of uniform random variables. Define $F^*(t) = P(X_j^* \le t)$.
Then $F^*$ has a density bounded away from zero on its support. Further, for $j \in J_n(x)$,
$X_j^* \ge x - X_j$.

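From hypothesis $\exists r > 0$ such that $F$ is continuous on $[x - r, x]$. Denote the density of $F^*$ by
$f^*$. Pick $\epsilon > 0$. Then $\exists \delta \in (0, r)$ such that $M \le (1 + \epsilon)m$, where $M = \max_{t \in [0, \delta]} f^*(t)$, and
$m = \min_{t \in [0, \delta]} f^*(t)$. Define

$$G(t) = \begin{cases}
F^*(t) & \text{if } t < \delta; \\
F^*(\delta) + m(t - \delta) & \text{if } \delta \le t < \delta + \frac{1}{m}[1 - F^*(\delta)]; \\
1 & \text{otherwise,}
\end{cases}$$

and set $Z_j = G^{-1} \circ F^*(X_j^*)$. $Z_j$ has cumulative distribution function $G$, $m \le G'(t) \le M$,
and $Z_j = x - X_j$ for $X_j \in (x - \delta, x]$. Let $Z_{(j)}$ denote the order statistics. From Lemma 1,
with probability tending to one as $n \to \infty$ every window point satisfies $x - X_i < \delta$ and hence
$Z_i = x - X_i$. Thus it suffices to show

$$\frac{k_n^{-1} \sum_{j=1}^{k_n} Z_{(j)}^2}{\bigl(k_n^{-1} \sum_{j=1}^{k_n} Z_{(j)}\bigr)^2} \xrightarrow{P} \frac{4}{3}.$$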

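Set $U_{(j)} = G(Z_{(j)})$. Then $\{U_{(j)}\}$ are distributed as the order statistics from a uniform
distribution. From construction of the $Z$'s and Taylor's theorem, $U_{(j)} = G(Z_{(j)}) = G(0) +
Z_{(j)} \frac{d}{dt}G(t)|_{t=\eta} = G'(\eta) \cdot Z_{(j)}$, where $\eta \in [0, Z_{(j)}]$ and $m \le \frac{d}{dt}G(t)|_{t=\eta} \le M$. To obtain an
upper bound of the limit one applies simple algebra to yield

$$\frac{k_n^{-1} \sum_{j=1}^{k_n} Z_{(j)}^2}{\bigl(k_n^{-1} \sum_{j=1}^{k_n} Z_{(j)}\bigr)^2}
\le \Bigl(\frac{M}{m}\Bigr)^2 \frac{k_n^{-1} \sum_{j=1}^{k_n} U_{(j)}^2}{\bigl(k_n^{-1} \sum_{j=1}^{k_n} U_{(j)}\bigr)^2}
\xrightarrow{P} \Bigl(\frac{M}{m}\Bigr)^2 \frac{4}{3} \le (1 + \epsilon)^2 \frac{4}{3}. \qquad (a)$$

Since the random variables $\{U_{(j)}/U_{(k_n+1)}\}$ are distributed according to the order statistics
of a uniform distribution, (a) follows from a simple variant of the weak law of large numbers.
A symmetric argument yields the lower bound. Since $\epsilon$ is arbitrary the result follows.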
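The constant 4/3 can be seen heuristically by replacing each of $k$ uniform order statistics by its expectation $j/(k+1)$. This replacement is an assumption of the illustration, not the argument used in the proof. The ratio of the mean square to the squared mean is then exactly $2(2k+1)/(3(k+1))$, which tends to 4/3, and $1/(4/3 - 1) = 3$ recovers the limit of $\hat{x}^2$.

```python
from fractions import Fraction


def moment_ratio(k):
    """(mean of u^2) / (mean of u)^2 for u_j = j / (k + 1), j = 1, ..., k."""
    u = [Fraction(j, k + 1) for j in range(1, k + 1)]
    mean = sum(u) / k
    mean_sq = sum(v * v for v in u) / k
    return mean_sq / mean ** 2


for k in (1, 10, 1000):
    print(k, float(moment_ratio(k)))  # approaches 4/3 as k grows
```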
Lemma 3: $\forall x \in S$, $E \sum_{i \in J} (m(x) - m(X_i))^2 = O(k_n^3/n^2)$.

Proof: Keep the same notation and construction of Lemma 2. Define $W_j = F^*(X_j^*)$. Then
$W_j$ are uniformly distributed and from the mean value theorem, $W_j = X_j^* f^*(\eta)$ for some
$\eta \in [0, X_j^*]$. Let $\gamma = \inf_t f^*(t)$. From hypothesis III $\exists M(x)$ such that $|m(x) - m(X_{(j)})| \le
M(x)\,|x - X_{(j)}|$. Thus

$$
\begin{aligned}
E \sum_{i \in J} (m(x) - m(X_i))^2
&\le M^2(x)\, E \sum_{i \in J} (x - X_i)^2 \\
&\le M^2(x)\, E \sum_{i \in J} \Bigl(\frac{W_i}{\gamma}\Bigr)^2 \\
&\le \frac{M^2(x)}{\gamma^2} \sum_{j=1}^{k_n} E\, W_{(j)}^2 \\
&\le \frac{M^2(x)}{\gamma^2} \frac{k_n^3}{n^2} && (a)
\end{aligned}
$$

Step (a) is a consequence of the fact that $W_{(j)}$ is the $j$-th order statistic from $n$ uniformly
distributed random variables and consequently is distributed according to a Beta distribution
$B(j, n + 1 - j)$.
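Step (a) can be checked exactly. A Beta$(a, b)$ variable has second moment $a(a+1)/((a+b)(a+b+1))$, so for $B(j, n+1-j)$ the second moment is $j(j+1)/((n+1)(n+2))$ and the sum over $j = 1, \ldots, k$ is $k(k+1)(k+2)/(3(n+1)(n+2))$, which is at most $k^3/n^2$ once $k \ge 2$. The sketch below confirms this in rational arithmetic; the function names are hypothetical.

```python
from fractions import Fraction


def second_moment(j, n):
    """E[U^2] for U ~ Beta(j, n + 1 - j), the j-th order statistic of n uniforms."""
    a, b = j, n + 1 - j
    return Fraction(a * (a + 1), (a + b) * (a + b + 1))


def window_sum(k, n):
    """Sum of the second moments of the k smallest uniform order statistics."""
    return sum(second_moment(j, n) for j in range(1, k + 1))


k, n = 10, 1000
print(window_sum(k, n), Fraction(k ** 3, n ** 2))  # the sum is below k^3 / n^2
```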